Abstract 

An  Efficient  Computational  Alternative  To 
'Using  Linear  Programming  to  Design 
Oil  Pollution  Detection  Schedules' 

by 

Lee  E.  Daniel,  Jr. 

Sandal  S.  Hart 

Thom J. Hodgson

In  Olson,  Wright,  and  McKell's  recent  paper  on  the  design  of  oil  pollution 
detection  schedules,  an  interesting  and  inventive  development  and  application 
of  a Markov  Decision  Process  was  presented.  Optimal  schedules  for  patrol 
flights  of  surveillance  aircraft  were  found  using  linear  programming.  In  this 
paper the model has been reformulated as a discrete time semi-Markov process.
Significant  computational  advantages  accrue  from  this  alternative  approach. 
Olson, Wright, and McKell [6] recently reported on a very interesting
application of the linear programming formulation for Markov decision processes.

The  U.  S.  Coast  Guard  is  charged  with  the  responsibility  for  environmental 
protection  in  our  coastal  areas.  Specifically,  they  are  chartered  to  operate 
electronic  sensor  equipped  patrol  aircraft  whose  mission,  in  this  case,  is  the 
detection  and  prevention  of  oil  and  hazardous  material  pollution  in  coastal  and 
offshore  areas.  A search  model  was  developed  for  scheduling  patrol  flights. 

This  model  is  part  of  a system  called  Pollution  Detection  and  Prevention  System 
(PDAPS). The objective of the model is to find a flight plan or schedule which
maximizes  the  expected  number  of  pollution  detections  per  patrol  flight.  Each 
flight  may  cover  a given  number  of  known  geographical  sectors  where  pollution 
is  likely  to  occur.  The  probability  of  a pollution  incident  occurring  in  a 
geographical  area  is  obtained  from  historical  pollution  statistics,  shipping 
statistics,  and  pollution  prediction  models  which  are  contained  in  PDAPS.  It  may 
not  be  possible  to  search  all  sectors  of  interest  in  one  flight.  Of  those  sectors 
which  are  searched,  there  may  be  multiple  possible  flight  patterns  depending  on 
physical  properties  of  the  sector  and  flight  altitude.  For  example,  three  flight 
patterns  are  shown  for  a sector  in  figure  1.  The  detection  probability  for  a 
sector  varies  according  to  the  pattern  flown. 

In  addition  to  maximizing  the  expected  number  of  pollution  incidents  detected, 
the  pollution  flight  schedules  include  a randomness  factor  in  order  to  have  a 
preventive  effect  on  intentional  polluters.  By  this  we  mean  that  schedule 
generation is performed in two stages. First, an optimal "expected value" schedule is
generated.  Second,  each  time  an  actual  flight  is  made,  a schedule  is  generated 
randomly  from  the  "expected  value"  schedule.  The  amount  of  "randomness"  in  the 
actual  schedule  is  related  to  a randomness  factor  e,  to  be  defined  shortly. 
Figure 1. Three Different Flight Patterns
for a Geographical Pollution Sector
States of the system, for modelling purposes, are described by a 3-tuple
in which the first entry designates the geographical sector, the second indicates
the sector exit point (i.e., the flight pattern used to search the sector), and
the third gives the remaining flight time. It is clear that the state definition
has the Markov property, in that for any given system state (i.e., the
location of the aircraft and the remaining flight time) the definition includes
the information necessary to plan the rest of a flight. The reader may note that
the definition, however, does allow revisiting of geographic sectors (i.e.,
subtours). This shortcoming is easily eliminated in practice. The reader is
referred  to  [5,  6]  for  more  detail.  Included  in  the  set  of  states  are  states 
representing  both  the  beginning  and  the  end  of  the  flight  (normally  the  same 
location,  but  not  necessarily). 

We now move to a linear programming formulation of the patrol flight
scheduling problem. The objective of the linear program is, for a given
randomness factor e, to maximize the expected number of pollution detections.

Note that in the following discussion the 3-tuple system state variable is
represented by a single dimensioned variable. Define $S_i$ as the set of possible
successor states associated with state $i$. If $S$ denotes the set of all states, then
$S = S_1 \cup S_2 \cup \cdots \cup S_N$, where $N$ is the number of states. Let $M$ denote the
total flight time available for a patrol mission. $t_{ij}$ is the total flight
time associated with the transition from state $i$ to state $j$. $Pd_{ij}$ is the
probability of detecting a pollution incident associated with the transition
from state $i$ to state $j$. The introduction of randomness into the flight
schedules is accomplished in the following manner. Let $q_{ij}(a)$ be the
probability of going to state $j$ given that the current state is $i$ and it is
desired to transition to state $a$. This is related to the randomness factor e as
follows:
Here $c_i^a$ represents the expected number of detections from state $i$ given that the decision
is to go to $a$. Let $z_{ia}$ be the probability of being in state $i$ and choosing to go
to state $a$.

The  final  linear  programming  model  (equivalent  to  model  II  in  [6])  is: 

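Find $\{z_{ia}\}$, $i \in S$, $a \in S_i$, in order to

$$\max \sum_{i \in S} \sum_{a \in S_i} c_i^a z_{ia}$$

subject to

$$\sum_{i \in S} \sum_{a \in S_i} z_{ia} = 1,$$

$$\sum_{i \in S} \sum_{a \in S_i} z_{ia} \big( \delta_{ij} - q_{ij}(a) \big) = 0, \quad \text{all } j \in S,$$

where

$$\delta_{ij} = \begin{cases} 0, & i \ne j \\ 1, & i = j. \end{cases}$$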
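To make the structure of the linear program concrete, the following minimal Python sketch assembles its objective and equality constraints from given $c_i^a$ and $q_{ij}(a)$ values. The function name `build_lp`, the nested-dictionary layout, and the two-state data are illustrative assumptions, not taken from [6]; the sketch only builds and checks the constraint rows and does not solve the program.

```python
def build_lp(c, q):
    """Assemble objective and equality constraints of the detection LP.

    c[i][a]    : expected detections from state i when the decision is a
    q[i][a][j] : probability of going to j from i when a is desired
    (this dictionary layout is an assumption made for illustration)
    """
    states = sorted(q)
    # One variable z_ia per (state, desired successor) pair.
    variables = [(i, a) for i in states for a in sorted(q[i])]
    objective = [c[i][a] for (i, a) in variables]
    # Normalization row: the z_ia sum to one.
    a_eq = [[1.0] * len(variables)]
    b_eq = [1.0]
    # Balance rows: sum_ia z_ia * (delta_ij - q_ij(a)) = 0 for every j.
    for j in states:
        row = []
        for (i, a) in variables:
            delta = 1.0 if i == j else 0.0
            row.append(delta - q[i][a].get(j, 0.0))
        a_eq.append(row)
        b_eq.append(0.0)
    return variables, objective, a_eq, b_eq


# Hypothetical two-state example with no randomness: 0 -> 1 -> 0 -> ...
q = {0: {1: {1: 1.0}}, 1: {0: {0: 1.0}}}
c = {0: {1: 0.4}, 1: {0: 0.0}}
variables, objective, a_eq, b_eq = build_lp(c, q)

# The alternating policy spends half of its transitions in each state.
z = [0.5, 0.5]
residuals = [sum(coef * val for coef, val in zip(row, z)) - rhs
             for row, rhs in zip(a_eq, b_eq)]
value = sum(coef * val for coef, val in zip(objective, z))
```

The resulting arrays could be handed to any LP solver, with the objective negated if the solver minimizes and with nonnegativity of the $z_{ia}$ assumed.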


It should be noted that the linear programming formulation of Olson, et al. [6],
is equivalent to that of Derman [1], Wolfe and Dantzig [8], and Manne [4] for
Markov decision processes. The structure of the linear program ensures that one
and only one policy is chosen for each state of the system. The execution time
for solving this system as a linear program is quite large, particularly for
problems of practical size (see Table 1).


In reviewing the process, it can be observed that, disregarding the
remaining flight time, the process of going from sector to sector is Markovian.
Therefore, the process can be viewed as a discrete time semi-Markov process
where the state of the system is described by a 2-tuple in which the first
entry is the geographic sector and the second indicates the sector exit point
(flight pattern). The time between transitions is the transit time between
sector exit points. The semi-Markov structure results in a considerable
reduction in the number of states of the system. Since the problem has a finite
planning horizon (length of the flight), dynamic programming appears to be the
appropriate solution technique. From Howard [3], the general form of the finite
horizon, discrete time semi-Markov dynamic recursion is (ignoring boundary conditions):

$$v_i(n) = \max_a \Big\{ c_i^a + \sum_{j=1}^{N} q_{ij}(a) \sum_{m=1}^{n} v_j(n - m)\, h_{ij}(m) \Big\}, \qquad (1)$$

where

$v_i(n)$ is the maximum expected number of pollution detections over the
remaining $n$ time units given the process starts in state $i$, and

$h_{ij}(m)$ is the probability that $m$ time units will be required to go
from state $i$ to state $j$ when it is desired to transition to state $a$.

Note that the state designations $i$ and $j$ now refer to the redefined
2-dimensional states of the semi-Markov model. For this particular problem,
the inner sum over the transition time probabilities in (1) has only one term since
the transition (flight) time between states is assumed to be deterministic. Hence,
there is a matrix of transition times from state $i$ to state $j$ ($t_{ij}$), rather
than one of functions of transition times from state $i$ to state $j$ ($h_{ij}(m)$).
Equation (1) simplifies to the following form:
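A hedged reconstruction of the simplified recursion follows. It assumes that deterministic flight times are encoded by an indicator form of $h_{ij}(m)$, which is not written out in the text, and that $v_j$ is taken as $-\infty$ for negative arguments, as in the boundary conditions given in the text:

```latex
h_{ij}(m) =
\begin{cases}
  1, & m = t_{ij} \\
  0, & \text{otherwise}
\end{cases}
\quad \Longrightarrow \quad
v_i(n) = \max_a \Big\{ c_i^a + \sum_{j=1}^{N} q_{ij}(a)\, v_j(n - t_{ij}) \Big\}
```

Under this assumption the inner sum over $m$ in (1) collapses to the single term with $m = t_{ij}$.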
Table 1. Computation Times for Linear Program***

Length of flight    Number of        Computation Time*
(minutes)           Markov States    (Seconds)

**                  66               21.
120                 341              552.
420                 1565             7815.

[* CPU Seconds on CDC 6000 Series Computer [5]]

[** Not able to determine]

[*** Linear Programs run using the CDC OPTIMA System]
The boundary conditions are:

$$v_i(0) = -\infty, \quad i \ne \text{the final (or home) state},$$

$$v_i(0) = 0, \quad i = \text{the final (or home) state, and}$$

$$v_i(n) = -\infty, \quad \text{for negative values of } n.$$


The procedure is terminated when $v_I(M)$ is calculated, where $I$ is the initial
(or beginning) state and $M$ is the length of the flight.
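The recursion and its boundary conditions can be sketched in a few lines of Python, assuming deterministic integer transition times of at least one time unit. The function `solve_patrol_dp`, the dictionary layout, and the three-state example (a base plus two sectors, with randomness switched off so that each $q_{ij}(a)$ puts probability 1 on the desired state) are hypothetical illustrations; this is not the FORTRAN IV code used for the reported runs.

```python
import math

NEG_INF = -math.inf


def solve_patrol_dp(c, q, t, home, start, M):
    """Tabulate v[i][n], the best expected detections with n time units left.

    c[i][a]    : expected detections from state i when the decision is a
    q[i][a][j] : probability of going to j from i when a is desired
    t[i][j]    : deterministic transition time, assumed to be an integer >= 1
    """
    states = list(t)
    v = {i: [NEG_INF] * (M + 1) for i in states}
    v[home][0] = 0.0  # boundary: the flight must finish at the home state
    for n in range(1, M + 1):
        for i in states:
            best = NEG_INF
            for a in c[i]:
                total = c[i][a]
                for j, prob in q[i][a].items():
                    if prob == 0.0:
                        continue  # avoids 0 * (-inf), which would be NaN
                    left = n - t[i][j]
                    # v_j is -inf when the remaining time would go negative
                    total += prob * (v[j][left] if left >= 0 else NEG_INF)
                best = max(best, total)
            v[i][n] = best
    return v[start][M], v


# Hypothetical data: state 0 is the base, states 1 and 2 are sectors.
t = {0: {1: 2, 2: 3}, 1: {2: 2, 0: 2}, 2: {1: 2, 0: 3}}
pd = {0: 0.0, 1: 0.3, 2: 0.5}  # detection probability on reaching each state
c = {i: {a: pd[a] for a in t[i]} for i in t}
q = {i: {a: {a: 1.0} for a in t[i]} for i in t}  # no randomness in this toy case

best, v = solve_patrol_dp(c, q, t, home=0, start=0, M=7)
```

In this toy instance the best 7-unit flight visits both sectors once and returns to base. Entries that stay at negative infinity mark combinations of state and remaining time from which the flight cannot finish at the home state exactly on time.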

It  should  be  noted  that  the  semi-Markov  formulation  is  equivalent  to  that 
of Olson, et al., and the state reduction techniques discussed in [5, 6] apply
to  this  formulation.  Computational  advantages  accrue  from  the  significant 
reduction  in  the  number  of  states  in  the  system  and  the  finite  horizon  dynamic 
programming  approach  (as  opposed  to  the  infinite  horizon  LP  approach). 

Equivalence  of  size,  for  comparison  of  computational  efficiency  of  the 
semi-Markov  and  Markov  formulations,  is  easily  established. 

Certain entries of the $v_i(n)$ matrix can be determined to be infeasible (i.e., it
is either impossible to reach state $i$ from the start point in $n$ time units, or
it is impossible to reach the finish point from state $i$ in the flight time
remaining). In addition, it is possible to limit, artificially, the time window
within which each sector can be visited. For instance, due to operational
considerations associated with a given data set to be used as input for the dynamic
program, it may be apparent that a certain geographic sector can only be visited
early in the flight, if at all. It would make sense, then, to declare those
entries of $v_i(n)$ associated with the geographic sector to be infeasible for time
periods ($n$) greater than the latest reasonable visitation time. The remaining
(feasible) elements of $v_i(n)$ each represent states of the system for the Markov
formulation. Therefore, counting the feasible entries of the $v_i(n)$ matrix gives
the equivalent number of states of the Markov problem for a given semi-Markov
problem.
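The counting argument can be illustrated with a short Python sketch. It adopts one possible reading of the feasibility test, stated here as an assumption: an entry (state $i$, remaining time $n$) is kept when the shortest time from the start to $i$ fits in the elapsed time $M - n$ and the shortest time from $i$ to the finish fits in $n$. The helper names and the three-state data are hypothetical, and no artificial time windows are applied.

```python
import heapq
import math


def shortest_times(t, source):
    """Dijkstra over deterministic transition times t[i][j]."""
    dist = {i: math.inf for i in t}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j, tij in t[i].items():
            if d + tij < dist[j]:
                dist[j] = d + tij
                heapq.heappush(heap, (d + tij, j))
    return dist


def count_feasible_entries(t, start, home, M):
    """Count (state, remaining time) pairs that pass both reachability tests."""
    from_start = shortest_times(t, start)
    # Reverse every arc so that distances from `home` become times to home.
    reverse = {i: {} for i in t}
    for i in t:
        for j, tij in t[i].items():
            reverse[j][i] = tij
    to_home = shortest_times(reverse, home)
    count = 0
    for i in t:
        for n in range(M + 1):
            if from_start[i] <= M - n and to_home[i] <= n:
                count += 1
    return count


# Hypothetical data: state 0 is the base, states 1 and 2 are sectors.
t = {0: {1: 2, 2: 3}, 1: {2: 2, 0: 2}, 2: {1: 2, 0: 3}}
n_feasible = count_feasible_entries(t, start=0, home=0, M=7)
n_total = len(t) * (7 + 1)
```

Here `n_feasible` plays the role of the equivalent number of Markov states, while `n_total` is the full size of the $v_i(n)$ table for the same semi-Markov problem.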
The data used by Olson et al. was not available. Therefore, a problem
was formulated using realistic data from [7]. The sectors considered were
potential oil well drilling sites and shipping lanes in the Gulf of Mexico off the
Florida coast. The aircraft was assumed to fly at a speed of 130 knots. Flight
patterns were designed for each sector and the detection probabilities were
randomly generated. There were 35 sectors which, when combined with the various
flight patterns, resulted in 111 semi-Markov states. Time was discretized,
as in [5, 6], in minutes. The equivalent number of Markov states
depended on the length of the flight and the extent to which various state
reduction techniques [5, 6] were applied. The computational results for the
various runs made are displayed in Table 2. In addition, the computational
results are included in Figure 2 for comparison with the linear programming
results. (Note that Figure 2 is a semi-log graph.)

The  relative  difference  in  computation  times  is  quite  large.  It  might 
be  argued  that  some  of  the  difference  can  be  attributed  to  the  relative 
speeds  of  the  CDC  6000  series  computer  used  for  the  Linear  Program  and  the 
IBM-370  model  165  computer  used  for  the  Dynamic  Program,  but  the  computational 
differences  (more  than  4 orders  of  magnitude)  are  large  enough  to  absorb 
easily  any  differences  in  machine  speed.  For  the  problem  at  hand,  the  power 
of  the  discrete  time  semi-Markov  process  as  a modelling  tool  is  that  it  brings 
forth  the  underlying  structure  in  a more  straightforward  fashion.  This  allows  a 
simplified  computational  approach  to  the  optimization  and  results  in  the 
computational  efficiencies  observed. 
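As a rough check on the magnitude quoted above, one can compare the largest linear programming run in Table 1 (1565 Markov states, 7815 seconds) with the dynamic programming run of nearest equivalent size in Table 2 (1076 equivalent Markov states, 0.396 seconds). The pairing of these two rows is an assumption made here for illustration, and the ratios ignore the difference in machines:

```latex
\frac{7815\ \text{s}}{0.396\ \text{s}} \approx 1.97 \times 10^{4},
\qquad
\frac{7815 / 1565}{0.396 / 1076} \approx 1.36 \times 10^{4}
```

Both the raw ratio and the per-state ratio exceed $10^4$ for this pairing.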

The computer storage requirements for the experimental dynamic program code
are reasonably modest, all things considered. The 6 hr. (360 minute) flight
problem (the largest we ran) requires less than 260K bytes of core. No additional
off-line storage is necessary. Rather large efficiencies (~50%) could be
effected through list processing to eliminate storage requirements for
infeasible entries of $v_i(n)$, but that was not implemented in the code.
Table 2. Computation Times for Dynamic Program**

Number of Semi-    Length of flight    Equivalent Number    Computation Time*
Markov States      (minutes)           of Markov States     (Seconds)

111                180                 1076                 .396
111                180                 2914                 1.151
111                360                 4120                 1.694
111                360                 7303                 2.626
111                360                 14592                7.125

[* CPU Seconds on IBM 370-165 computer]
[** Dynamic Program coded in FORTRAN IV]